transfer task
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers
Music editing has emerged as an important and practical area of artificial intelligence, with applications ranging from video game and film music production to personalizing existing tracks according to user preferences. However, existing models face significant limitations, such as being restricted to editing synthesized music generated by their own models, requiring highly precise prompts, or necessitating task-specific retraining, and thus lack true zero-shot capability. Experimental results demonstrate that our approach outperforms existing methods in preserving musical content, structural consistency, and editing fidelity, establishing a strong foundation for controllable music editing in real-world scenarios.
The landscape of audio generation has shifted dramatically in recent years. Text-to-music systems now allow users to compose entire musical pieces from simple textual descriptions, powered by advances in diffusion models and transformer architectures [1]-[11]. While impressive, these systems are still primarily designed for creation from scratch. In contrast, real-world music practice often revolves around editing: refining a performance, altering instrumentation, or adapting an existing recording into a new style. For musicians, producers, and casual creators alike, the ability to reshape existing audio is often more valuable than generating entirely new material. Music editing, however, is fundamentally more difficult than generation. It requires the model to balance two competing goals: applying the requested modification faithfully, and preserving the rich details of the input recording that should remain unchanged. This trade-off is especially challenging when dealing with expressive, polyphonic, or multi-instrumental recordings. Existing research has attempted to address editing through supervised datasets of paired "before" and "after" examples [12]-[14], or through zero-shot latent manipulations in diffusion models [15]-[17]. Yet most methods remain limited to specific editing tasks, operate mainly on model-generated music rather than arbitrary recordings, and often require very precise prompts to succeed [15], [17].
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (5 more...)
- Leisure & Entertainment (0.68)
- Education (0.68)
Global-focal Adaptation with Information Separation for Noise-robust Transfer Fault Diagnosis
Ren, Junyu, Gan, Wensheng, Zhang, Guangyu, Zhong, Wei, Yu, Philip S.
Rotating machinery [1] is critical in industrial applications, where system reliability is essential to avoid financial losses and safety risks. Therefore, timely fault diagnosis is a crucial engineering priority. Deep learning-based fault diagnosis has achieved remarkable success due to its ability to extract features and model complex nonlinear relationships [2, 3]. However, industrial rotating machines operate under diverse conditions, leading to domain shifts that degrade the diagnostic performance of conventional deep learning methods [4]. Among the powerful artificial intelligence (AI) technologies, transfer learning [5] can address these limitations through cross-task knowledge transfer, where domain adaptation has become a widely adopted technique in fault diagnosis, primarily encompassing metric-based approaches, adversarial frameworks, and their hybrid variants [4, 6]. Currently, cross-domain fault diagnosis methods have been extended to encompass a wider range of diverse and practical application scenarios [7]. Given that source domain data are often more abundant in real-world settings, several studies have proposed multi-source transfer fault diagnosis approaches [8, 9]. For closed-set scenarios, various domain adaptation methods have been developed [10]. Since the label categories between source and target domains may not be completely identical, open-set domain adaptation and partial domain adaptation methods have been developed for fault diagnosis [11].
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Research Report (0.64)
- Overview (0.46)
- Information Technology (0.46)
- Health & Medicine > Diagnostic Medicine (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
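The fault-diagnosis abstract above mentions metric-based domain adaptation without naming a specific metric. As a rough illustration only, the sketch below computes a squared maximum mean discrepancy (MMD) with an RBF kernel, one commonly used distance between source-domain and target-domain features. The choice of MMD, the function names, and the `gamma` parameter are assumptions made for this example and are not taken from the paper.

```python
import math


def rbf(x, y, gamma=1.0):
    # RBF kernel between two feature vectors (gamma is an assumed bandwidth).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_squared(source, target, gamma=1.0):
    # Biased estimate of squared MMD between two sets of feature vectors.
    def mean_kernel(xs, ys):
        total = sum(rbf(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical 1-D features from two operating conditions.
source_features = [[0.0], [1.0]]
target_features = [[5.0], [6.0]]
shift = mmd_squared(source_features, target_features)
```

A metric-based adaptation method would typically add a term like `shift` to the training loss so that the feature extractor learns representations on which the two domains look alike.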
A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
Iurada, Leonardo, Occhiena, Beatrice, Tommasi, Tatiana
The widespread availability of pre-trained vision models has enabled numerous deep learning applications through their transferable representations. However, their computational and storage costs often limit practical deployment. Pruning-at-Initialization has emerged as a promising approach to compress models before training, enabling efficient task-specific adaptation. While conventional wisdom suggests that effective pruning requires task-specific data, this creates a challenge when downstream tasks are unknown in advance. In this paper, we investigate how data influences the pruning of pre-trained vision models. Surprisingly, pruning on one task retains the model's zero-shot performance also on unseen tasks. Furthermore, fine-tuning these pruned models not only improves performance on original seen tasks but can recover held-out tasks' performance. We attribute this phenomenon to the favorable loss landscapes induced by extensive pre-training on large-scale datasets.
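To make the idea of scoring weights before training concrete, here is a minimal sketch of a first-order, SNIP-style saliency criterion, a well-known Pruning-at-Initialization baseline. It is not the second-order criterion studied in the paper above, which the abstract does not spell out; the function name `prune_mask` and the toy weights and gradients are hypothetical.

```python
def prune_mask(weights, grads, sparsity):
    # Score each weight by |weight * gradient| (SNIP-style saliency).
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    # Number of weights to keep at the requested sparsity level.
    n_keep = len(weights) - int(round(sparsity * len(weights)))
    # Indices of the highest-scoring weights.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    mask = [0] * len(weights)
    for i in ranked[:n_keep]:
        mask[i] = 1
    return mask


# Toy example: gradients would come from a batch of task data.
weights = [0.5, -2.0, 0.1, 1.0]
grads = [1.0, 0.5, 3.0, 0.01]
mask = prune_mask(weights, grads, sparsity=0.5)  # keeps the first two weights
```

Because the gradients depend on the data used to compute them, the resulting mask is task-dependent, which is the dependence on data that the abstract sets out to examine.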
Transferable Deployment of Semantic Edge Inference Systems via Unsupervised Domain Adaption
Jiao, Weiqiang, Bi, Suzhi, Li, Xian, Guo, Cheng, Chen, Hao, Quan, Zhi
This paper investigates deploying semantic edge inference systems for performing a common image clarification task. In particular, each system consists of multiple Internet of Things (IoT) devices that first locally encode the sensing data into semantic features and then transmit them to an edge server for subsequent data fusion and task inference. The inference accuracy is determined by efficient training of the feature encoder/decoder using labeled data samples. Due to the difference in sensing data and communication channel distributions, deploying the system in a new environment may induce high costs in annotating data labels and re-training the encoder/decoder models. To achieve cost-effective transferable system deployment, we propose an efficient Domain Adaptation method for Semantic Edge INference systems (DASEIN) that can maintain high inference accuracy in a new environment without the need for labeled samples. Specifically, DASEIN exploits the task-relevant data correlation between different deployment scenarios by leveraging the techniques of unsupervised domain adaptation and knowledge distillation. It devises an efficient two-step adaptation procedure that sequentially aligns the data distributions and adapts to the channel variations. Numerical results show that, under a substantial change in sensing data distributions, the proposed DASEIN outperforms the best-performing benchmark method by 7.09% and 21.33% in inference accuracy when the new environment has similar or 25 dB lower channel signal-to-noise power ratios (SNRs), respectively. This verifies the effectiveness of the proposed method in adapting both data and channel distributions in practical transfer deployment applications. Index Terms: Semantic communications, edge inference, transfer learning, unsupervised domain adaptation.
Thanks to the advancement of artificial intelligence (AI), it has become prevalent in recent years to deploy smart Internet of Things (IoT) systems using deep neural networks (DNNs) to perform complex inference tasks, e.g., computer vision based object recognition [1]-[3]. In particular, wireless IoT devices, such as video surveillance cameras, are systematically deployed at target locations to collect real-time sensing data and collaboratively accomplish specific inference tasks. The performance of on-device AI inference, however, is significantly constrained by the limited battery energy and computing power of IoT devices.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Hawaii (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- (5 more...)
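The DASEIN abstract above names knowledge distillation as one of its building blocks but does not give the loss it uses. The sketch below shows the generic soft-label form of distillation, in which a student model is trained to match a teacher's temperature-softened output distribution. The function names, the temperature value, and the example logits are assumptions for illustration, not details from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence from the teacher's softened distribution to the student's.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical logits for a three-class inference task.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 1.0]
loss = distillation_loss(teacher, student)
```

In a label-free deployment setting such as the one the abstract describes, a loss of this kind lets a model trained in the original environment supervise the adapted model without requiring annotated target samples.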
Tune It Up: Music Genre Transfer and Prediction
Samet, Fidan, Bakir, Oguz, Fidan, Adnan
Deep generative models have been used in style transfer tasks for images. In this study, we adapt and improve the CycleGAN model to perform music style transfer between the Jazz and Classic genres. By doing so, we aim to easily generate new songs, cover music in different genres, and reduce the arrangement work needed in those processes. We train a music genre classifier and use it to assess the performance of the transfer models; with a Multi-layer Perceptron, it reaches 87.7% accuracy. To improve our style transfer baseline, we add auxiliary discriminators and a triplet loss to our model. In our experiments, the best accuracies measured by our genre classifier are 69.4% on the Jazz-to-Classic task and 39.3% on the Classic-to-Jazz task. We also run a subjective experiment, whose results show that the overall performance of our transfer model is good and that it preserves the melody of the inputs in the transferred outputs. Our code is available at https://github.com/fidansamet/tune-it-up
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
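The "Tune It Up" abstract above mentions adding a triplet loss to the style transfer model but does not say how the triplets are formed. As a minimal sketch, the standard margin-based triplet loss on embedding vectors looks like this; the Euclidean distance, the margin value, and the example embeddings are assumptions for illustration.

```python
import math


def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    # Penalize cases where the positive is not closer than the negative by `margin`.
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Hypothetical 2-D embeddings.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]  # far from the anchor, so the loss is zero
hard_negative = [0.0, 1.5]  # close to the anchor, so the loss is positive
easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
```

One plausible use in a genre transfer setting would be to pull a transferred sample toward its source melody or target genre and push it away from a mismatched example, though the abstract does not specify which of these the authors chose.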